14 research outputs found
Towards Scene Understanding with Detailed 3D Object Representations
Current approaches to semantic image and scene understanding typically employ
rather simple object representations such as 2D or 3D bounding boxes. While
such coarse models are robust and allow for reliable object detection, they
discard much of the information about objects' 3D shape and pose, and thus do
not lend themselves well to higher-level reasoning. Here, we propose to base
scene understanding on a high-resolution object representation. An object class
- in our case cars - is modeled as a deformable 3D wireframe, which enables
fine-grained modeling at the level of individual vertices and faces. We augment
that model to explicitly include vertex-level occlusion, and embed all
instances in a common coordinate frame, in order to infer and exploit
object-object interactions. Specifically, from a single view we jointly
estimate the shapes and poses of multiple objects in a common 3D frame. A
ground plane in that frame is estimated by consensus among different objects,
which significantly stabilizes monocular 3D pose estimation. The fine-grained
model, in conjunction with the explicit 3D scene model, further allows one to
infer part-level occlusions between the modeled objects, as well as occlusions
by other, unmodeled scene elements. To demonstrate the benefits of such
detailed object class models in the context of scene understanding we
systematically evaluate our approach on the challenging KITTI street scene
dataset. The experiments show that the model's ability to utilize image
evidence at the level of individual parts improves monocular 3D pose estimation
w.r.t. both location and (continuous) viewpoint.
Comment: International Journal of Computer Vision (appeared online on 4
November 2014). Online version:
http://link.springer.com/article/10.1007/s11263-014-0780-
Unsupervised Activity Segmentation by Joint Representation Learning and Online Clustering
We present a novel approach for unsupervised activity segmentation, which
uses video frame clustering as a pretext task and simultaneously performs
representation learning and online clustering. This is in contrast with prior
work, where representation learning and clustering are often performed
sequentially. We leverage temporal information in videos by employing temporal
optimal transport. In particular, we incorporate a temporal regularization term
which preserves the temporal order of the activity into the standard optimal
transport module for computing pseudo-label cluster assignments. The temporal
optimal transport module enables our approach to learn effective
representations for unsupervised activity segmentation. Furthermore, previous
methods require storing learned features for the entire dataset before
clustering them in an offline manner, whereas our approach processes one
mini-batch at a time in an online manner. Extensive evaluations on three public
datasets, i.e., 50-Salads, YouTube Instructions, and Breakfast, and our dataset,
i.e., Desktop Assembly, show that our approach performs on par with or better than
previous methods for unsupervised activity segmentation, despite having
significantly lower memory requirements.
Comment: Preprint. Under review
Permutation-Aware Action Segmentation via Unsupervised Frame-to-Segment Alignment
This paper presents an unsupervised transformer-based framework for temporal
activity segmentation which leverages not only frame-level cues but also
segment-level cues. This is in contrast with previous methods, which often rely
on frame-level information only. Our approach begins with a frame-level
prediction module which estimates framewise action classes via a transformer
encoder. The frame-level prediction module is trained in an unsupervised manner
via temporal optimal transport. To exploit segment-level information, we
utilize a segment-level prediction module and a frame-to-segment alignment
module. The former includes a transformer decoder for estimating video
transcripts, while the latter matches frame-level features with segment-level
features, yielding permutation-aware segmentation results. Moreover, inspired
by temporal optimal transport, we introduce simple-yet-effective pseudo labels
for unsupervised training of the above modules. Our experiments on four public
datasets, i.e., 50 Salads, YouTube Instructions, Breakfast, and Desktop
Assembly, show that our approach achieves performance comparable to or better than
previous methods in unsupervised activity segmentation.
Comment: Accepted to WACV 202
Comparative Design Space Exploration of Dense and Semi-Dense SLAM
SLAM has matured significantly over the past few years, and is beginning to
appear in serious commercial products. While new SLAM systems are being
proposed at every conference, evaluation is often restricted to qualitative
visualizations or accuracy estimation against a ground truth. This is due to
the lack of benchmarking methodologies which can holistically and
quantitatively evaluate these systems. Moreover, investigation at the level of
individual kernels and parameter spaces of SLAM pipelines, which is essential
for systems research and integration, is non-existent. We extend
the recently introduced SLAMBench framework to allow comparing two
state-of-the-art SLAM pipelines, namely KinectFusion and LSD-SLAM, along the
metrics of accuracy, energy consumption, and processing frame rate on two
different hardware platforms, namely a desktop and an embedded device. We also
analyze the pipelines at the level of individual kernels and explore their
algorithmic and hardware design spaces for the first time, yielding valuable
insights.
Comment: IEEE International Conference on Robotics and Automation 201
Urosepsis: Flow is Life
Urosepsis is an important cause of both community-acquired and hospital-acquired infections, and is accordingly divided into community-acquired and hospital-acquired urosepsis. Obstruction to the flow of urine is a common risk factor for community-acquired urosepsis, whereas an indwelling urinary catheter is the risk factor for hospital-acquired urosepsis. E. coli remains the most common bacterium causing urosepsis. If not treated early and appropriately, urosepsis can progress to septic shock and multiple organ dysfunction. The cornerstones of improved outcomes in these patients are initial resuscitation, proper antibiotic therapy, and restoring the flow of urine or removing the infected urinary catheter. Community-acquired urosepsis can be prevented by permanently removing the obstruction to the flow of urine. Hospital-acquired urosepsis can be prevented by strictly following the catheter-associated urinary tract infection prevention bundle and removing the catheter as early as possible.
Integrating Algorithmic Parameters into Benchmarking and Design Space Exploration in 3D Scene Understanding
System designers typically use well-studied benchmarks to evaluate and improve new architectures and compilers. We design tomorrow's systems based on yesterday's applications. In this paper we investigate an emerging application, 3D scene understanding, likely to be significant in the mobile space in the near future. Until now, this application could only run in real-time on desktop GPUs. In this work, we examine how it can be mapped to power-constrained embedded systems. Key to our approach is the idea of incremental co-design exploration, where optimization choices that concern the domain layer are incrementally explored together with low-level compiler and architecture choices. The goal of this exploration is to reduce execution time while minimizing power and meeting our quality of result objective. As the design space is too large to exhaustively evaluate, we use active learning based on a random forest predictor to find good designs. We show that our approach can, for the first time, achieve dense 3D mapping and tracking in the real-time range within a 1W power budget on a popular embedded device. This is a 4.8x execution time improvement and a 2.8x power reduction compared to the state-of-the-art.
Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM
Real-time dense computer vision and SLAM offer great potential for a new
level of scene modelling, tracking and real environmental interaction for many
types of robot, but their high computational requirements mean that use on mass
market embedded platforms is challenging. Meanwhile, trends in low-cost,
low-power processing are towards massive parallelism and heterogeneity, making
it difficult for robotics and vision researchers to implement their algorithms
in a performance-portable way. In this paper we introduce SLAMBench, a
publicly-available software framework which represents a starting point for
quantitative, comparable and validatable experimental research to investigate
trade-offs in performance, accuracy and energy consumption of a dense RGB-D
SLAM system. SLAMBench provides a KinectFusion implementation in C++, OpenMP,
OpenCL and CUDA, and harnesses the ICL-NUIM dataset of synthetic RGB-D
sequences with trajectory and scene ground truth for reliable accuracy
comparison of different implementations and algorithms. We present an analysis
and breakdown of the constituent algorithmic elements of KinectFusion, and
experimentally investigate their execution time on a variety of multicore and
GPU-accelerated platforms. For a popular embedded platform, we also present an
analysis of energy efficiency for different configuration alternatives.
Comment: 8 pages, ICRA 2015 conference paper